@akshatvishu
Contributor

@akshatvishu akshatvishu commented Nov 4, 2025

Closes #8996

Description:

The dspy.Audio.from_file and from_url methods rely on Python's mimetypes.guess_type() to determine the audio format. On some operating systems, this function returns non-standard MIME types, such as audio/x-wav for .wav files.

These non-standard format strings, often prefixed with x- (like x-wav or x-m4a), are then passed to the LLM API (e.g., OpenAI). This can cause a 400 BadRequestError, as the API typically only accepts compliant formats (e.g., wav, m4a).

This patch adds a check to from_file, from_url, and the data URI branch of encode_audio to normalize these formats by removing any x- prefix, ensuring an API-compliant format is always sent.
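The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: normalize_audio_format and guess_audio_format are hypothetical helper names, and the sketch assumes Python 3.9+ for str.removeprefix().

```python
import mimetypes


def normalize_audio_format(mime_type: str) -> str:
    """Hypothetical helper: turn a MIME type such as 'audio/x-wav' into 'wav'."""
    # Keep only the subtype (the part after the slash); a bare subtype
    # such as 'x-wav' passes through this step unchanged.
    subtype = mime_type.split("/", 1)[-1]
    # Strip a leading "x-" only; the rest of the string is left alone.
    return subtype.removeprefix("x-")


def guess_audio_format(path: str) -> str:
    """Hypothetical helper: guess the normalized format for a local file path."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"Cannot determine an audio MIME type for {path!r}")
    return normalize_audio_format(mime_type)


if __name__ == "__main__":
    print(normalize_audio_format("audio/x-wav"))  # wav
    print(normalize_audio_format("audio/mpeg"))   # mpeg
```

Splitting the logic this way keeps the string normalization independent of whatever mimetypes.guess_type() returns on a given platform, which makes it easy to unit-test.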

@TomeHirata
Collaborator

Thanks @akshatvishu, can you add a unit test?

@akshatvishu
Contributor Author

akshatvishu commented Nov 6, 2025

@TomeHirata I added the unit test. I also changed the logic slightly to use removeprefix() instead of replace(), so that only a leading "x-" is stripped from the audio format string. This prevents unintended replacements if "x-" appears elsewhere in the format.
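The difference between the two approaches can be seen in a small comparison. This is an illustrative sketch only: the function names are hypothetical, and "flex-audio" is a made-up format string chosen because it contains "x-" somewhere other than at the start.

```python
def strip_with_replace(fmt: str) -> str:
    """Removes every occurrence of 'x-', wherever it appears."""
    return fmt.replace("x-", "")


def strip_with_removeprefix(fmt: str) -> str:
    """Removes 'x-' only when the string starts with it (Python 3.9+)."""
    return fmt.removeprefix("x-")


if __name__ == "__main__":
    for fmt in ("x-wav", "flex-audio"):
        print(fmt, "->", strip_with_replace(fmt), "|", strip_with_removeprefix(fmt))
    # x-wav -> wav | wav
    # flex-audio -> fleaudio | flex-audio
```

Both approaches agree on "x-wav", but replace() corrupts the made-up "flex-audio" value while removeprefix() leaves it unchanged.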

Collaborator

@TomeHirata TomeHirata left a comment

LGTM

@TomeHirata TomeHirata merged commit b3c6350 into stanfordnlp:main Nov 10, 2025
10 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug] LiteLLM exception when using OpenAI Speech-To-Text models

2 participants